SBNet: Sparse Blocks Network for Fast Inference
Conventional deep convolutional neural networks (CNNs) apply convolution
operators uniformly in space across all feature maps for hundreds of layers -
this incurs a high computational cost for real-time applications. For many
problems such as object detection and semantic segmentation, we are able to
obtain a low-cost computation mask, either from a priori problem knowledge, or
from a low-resolution segmentation network. We show that such computation masks
can be used to reduce computation in the high-resolution main network. Variants
of sparse activation CNNs have previously been explored on small-scale tasks
and showed no degradation in terms of object classification accuracy, but often
measured gains in terms of theoretical FLOPs without realizing a practical
speed-up when compared to highly optimized dense convolution implementations.
In this work, we leverage the sparsity structure of computation masks and
propose a novel tiling-based sparse convolution algorithm. We verified the
effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we
report significant wall-clock speed-ups compared to dense convolution without
noticeable loss of accuracy.
Comment: 10 pages, CVPR 2018
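The gather/convolve/scatter pattern behind the tiling-based sparse convolution can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sparse_block_conv`, the tile size, and the Python-level loop are assumptions, and it presumes the mask is concrete (not traced) and that H and W are multiples of the tile size. The reported wall-clock speed-ups come from an optimized GPU implementation of this pattern, not from a loop like this.

```python
import jax
import jax.numpy as jnp

def sparse_block_conv(x, kernel, mask, tile=8):
    """x: (H, W, C) features, kernel: (k, k, C, C_out), mask: (H, W) in {0, 1}.

    A dense convolution is applied only on tiles whose mask contains an
    active pixel; fully inactive tiles are skipped and left as zeros
    (gather active tiles -> dense conv -> scatter results back).
    """
    H, W, C = x.shape
    k = kernel.shape[0]
    pad = k // 2
    out = jnp.zeros((H, W, kernel.shape[-1]), dtype=x.dtype)
    xp = jnp.pad(x, ((pad, pad), (pad, pad), (0, 0)))

    for i in range(0, H, tile):
        for j in range(0, W, tile):
            if not bool(mask[i:i + tile, j:j + tile].any()):
                continue  # skip inactive tiles: this is where computation is saved
            # Gather the tile plus a halo of `pad` pixels so the output is exact.
            patch = jax.lax.dynamic_slice(
                xp, (i, j, 0), (tile + 2 * pad, tile + 2 * pad, C))
            res = jax.lax.conv_general_dilated(
                patch[None], kernel, window_strides=(1, 1), padding="VALID",
                dimension_numbers=("NHWC", "HWIO", "NHWC"))[0]
            # Scatter the convolved tile back into the dense output map.
            out = jax.lax.dynamic_update_slice(out, res, (i, j, 0))
    return out
```

The design point the abstract emphasizes is that skipping whole contiguous tiles, rather than individual pixels, keeps the remaining work dense and regular, which is what lets a real implementation beat highly optimized dense convolution in wall-clock time.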
Scaling Forward Gradient With Local Losses
Forward gradient learning computes a noisy directional gradient and is a
biologically plausible alternative to backprop for learning deep neural
networks. However, the standard forward gradient algorithm, when applied
naively, suffers from high variance when the number of parameters to be learned
is large. In this paper, we propose a series of architectural and algorithmic
modifications that together make forward gradient learning practical for
standard deep learning benchmark tasks. We show that it is possible to
substantially reduce the variance of the forward gradient estimator by applying
perturbations to activations rather than weights. We further improve the
scalability of forward gradient by introducing a large number of local greedy
loss functions, each of which involves only a small number of learnable
parameters, and a new MLPMixer-inspired architecture, LocalMixer, that is more
suitable for local learning. Our approach matches backprop on MNIST and
CIFAR-10 and significantly outperforms previously proposed backprop-free
algorithms on ImageNet.
Comment: 30 pages, tech report
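The baseline estimator the abstract refers to, a noisy directional gradient computed with a single forward-mode pass, can be sketched as below. This shows only the standard weight-perturbed forward gradient on a hypothetical two-layer toy network; `loss_fn`, `forward_gradient`, the Gaussian direction, and the learning rate are illustrative assumptions, and the paper's activity perturbation, local greedy losses, and LocalMixer architecture are not shown.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy two-layer regression network used only for illustration.
    w1, w2 = params
    h = jnp.tanh(x @ w1)
    return jnp.mean((h @ w2 - y) ** 2)

def forward_gradient(params, x, y, key):
    """One forward-mode (JVP) pass yields an unbiased gradient estimate.

    Sample a random direction v, compute the directional derivative
    d = <grad, v> with jax.jvp, and return d * v; since E[v v^T] = I for
    standard Gaussian v, E[d * v] equals the true gradient.
    """
    keys = jax.random.split(key, len(params))
    v = [jax.random.normal(k, p.shape) for k, p in zip(keys, params)]
    loss, d = jax.jvp(lambda p: loss_fn(p, x, y), (params,), (v,))
    return loss, [d * vi for vi in v]

key = jax.random.PRNGKey(0)
params = [jax.random.normal(key, (16, 32)) * 0.1,
          jax.random.normal(key, (32, 1)) * 0.1]
x, y = jnp.ones((4, 16)), jnp.zeros((4, 1))
loss, grads = forward_gradient(params, x, y, key)
params = [p - 0.01 * g for p, g in zip(params, grads)]  # one SGD step
```

The variance of this estimator grows with the number of perturbed dimensions, which is why perturbing low-dimensional activations instead of weights, and breaking the network into many small locally supervised pieces, is the route the paper takes to make it scale.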